Abstract

Much research in program optimisation has focused on formal approaches to correctness: proving that the meaning of programs is preserved by the optimisation. Paradoxically, there has been comparatively little work on formal approaches to efficiency: proving that the performance of optimised programs is actually improved. This paper addresses this problem for a general-purpose optimisation technique, the worker/wrapper transformation. In particular, we use the call-by-need variant of improvement theory to establish conditions under which the worker/wrapper transformation is formally guaranteed to preserve or improve the time performance of programs in lazy languages such as Haskell.

Categories and Subject Descriptors D.1.1 [Programming Techniques]: Applicative (Functional) Programming

Keywords general recursion; improvement 
1. Introduction 

To misquote Oscar Wilde [?], "functional programmers know the value of everything and the cost of nothing"¹. More precisely, the functional approach to programming emphasises what programs mean in a denotational sense, rather than what programs do in terms of their operational behaviour. For many programming tasks this emphasis is entirely appropriate, allowing the programmer to focus on the high-level description of what is being computed rather than the low-level details of how this is realised. However, in the context of program optimisation both aspects play a central role, as the aim of optimisation is to improve the operational performance of programs while maintaining their denotational correctness.

A research paper on program optimisation should therefore justify both the correctness and the performance aspects of the optimisation described. There is a whole spectrum of possible approaches to this, ranging from informal tests and benchmarks [19], to tool-based methods such as property-based testing [3] and space/time profiling [?], all the way up to formal mathematical proofs [17]. For correctness, it is now becoming standard to formally prove that an optimisation preserves the meaning of programs. For performance, however, the standard approach is to provide some form of empirical evidence that an optimisation improves the efficiency of programs, and there is little published work on formal proofs of improvement.

¹ The general form of this misquote is due to Alan Perlis, who originally said it of Lisp programmers.

In this paper, we aim to go some way toward redressing this imbalance in the context of the worker/wrapper transformation [?], putting the denotational and operational aspects on an equally formal footing. The worker/wrapper transformation is a general-purpose optimisation technique that has already been formally proved correct, as well as being realised in practice as an extension to the Glasgow Haskell Compiler [?]. In this paper we formally prove that this transformation is guaranteed to preserve or improve time performance with respect to an established operational theory. In other words, we show that the worker/wrapper transformation never makes programs slower by more than a constant factor. Specifically, the paper makes the following contributions:



• We show how Moran and Sands' work on call-by-need improvement theory [15] can be applied to formally justify that the worker/wrapper transformation for least fixed points preserves or improves time performance;

• We present preconditions that ensure the transformation improves performance in this manner, which come naturally from the preconditions that ensure correctness;

• We demonstrate the utility of the new theory by verifying that examples from previous worker/wrapper papers indeed exhibit a time improvement.



The use of call-by-need improvement theory means that our work applies to lazy functional languages such as Haskell. Traditionally, the operational behaviour of lazy evaluation has been seen as difficult to reason about, but we show that with the right tools this need not be the case. To the best of our knowledge, this is the first time that a general-purpose optimisation method for lazy languages has been formally proved to improve time performance.

Improvement theory does not seem to have attracted much attention in recent years, but we hope that this paper can help to generate more interest in this and other techniques for reasoning about lazy evaluation. Whereas many papers omit or compress calculations and proofs for reasons of brevity, in this paper they are the central focus, and so are presented in detail.
2. Example: Fast Reverse

We shall begin with an example that motivates the rest of the paper: transforming the naive list reverse function into the so-called "fast reverse" function. This transformation is an instance of the worker/wrapper transformation, and there is an intuitive, informal justification of why this is an optimisation. Here we give this non-rigorous explanation; the remainder of this paper will focus on building the tools to strengthen it into a rigorous argument.

We start with a naive definition of the reverse function, which takes quadratic time to run, as each append ++ takes time linear in the length of its left argument:

reverse :: [a] -> [a]
reverse [] = []
reverse (x : xs) = reverse xs ++ [x]
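To make the quadratic cost concrete, the following sketch counts append steps under a simple assumed cost model: fully evaluating xs ++ ys costs one step per element of xs. The names naiveReverseCost, leftAssocCost and rightAssocCost are hypothetical, introduced only for illustration, and the counts assume the whole result is demanded, which under lazy evaluation need not be the case.

```haskell
-- Assumed cost model: fully evaluating xs ++ ys costs (length xs) steps,
-- because ++ recurses down its left argument only.

-- Append steps performed by the naive reverse: reversing (x : xs) costs the
-- steps for reverse xs plus one append over a list of length (length xs).
naiveReverseCost :: [a] -> Int
naiveReverseCost []       = 0
naiveReverseCost (_ : xs) = naiveReverseCost xs + length xs

-- (xs ++ ys) ++ zs traverses xs twice; xs ++ (ys ++ zs) traverses it once.
leftAssocCost, rightAssocCost :: [a] -> [a] -> [a] -> Int
leftAssocCost xs ys _  = length xs + length (xs ++ ys)
rightAssocCost xs ys _ = length ys + length xs

main :: IO ()
main = do
  -- n(n-1)/2 append steps for a list of length n
  mapM_ (\n -> print (n, naiveReverseCost [1 .. n])) [10, 100, 1000 :: Int]
  print (leftAssocCost [1, 2, 3] [4, 5] [6 :: Int], rightAssocCost [1, 2, 3] [4, 5] [6 :: Int])
```

The quadratic growth comes from summing 0 + 1 + ... + (n - 1) append steps, one append per element.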

We can write a more efficient version by using a worker function revcat with a wrapper around it that simply applies the worker function with [] as the second argument:

reverse' :: [a] -> [a]
reverse' xs = revcat xs []

The specification for the worker revcat is as follows:

revcat :: [a] -> [a] -> [a]
revcat xs ys = reverse xs ++ ys

From this specification we can calculate a new definition that does not depend on reverse. Because reverse is defined by cases, we will have one calculation for each case.

Case for []:

  revcat [] ys
= { specification of revcat }
  reverse [] ++ ys
= { definition of reverse }
  [] ++ ys
= { definition of ++ }
  ys

Case for (x : xs):

  revcat (x : xs) ys
= { specification of revcat }
  reverse (x : xs) ++ ys
= { definition of reverse }
  (reverse xs ++ [x]) ++ ys
= { associativity of ++ }
  reverse xs ++ ([x] ++ ys)
= { definition of ++ }
  reverse xs ++ (x : ys)
= { specification of revcat }
  revcat xs (x : ys)

Note the use of associativity of ++ in the third step, which is the only step not simply by definition or specification. Left-associated appends such as (xs ++ ys) ++ zs are less time-efficient than the equivalent right-associated appends xs ++ (ys ++ zs), as the former traverses xs twice. The intuition here is that the efficiency gain from this step in the proof carries over in some way to the rest of the proof, so that overall our calculated definition of revcat is more efficient than its original specification. The calculation gives us the following definition, which runs in linear time:



reverse xs = revcat xs []

revcat [] ys = ys
revcat (x : xs) ys = revcat xs (x : ys)
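The calculated definitions can be run directly. The following sketch checks the specification revcat xs ys = reverse xs ++ ys on a handful of inputs; the names naiveReverse and fastReverse are used here only to avoid clashing with Prelude's reverse, and a finite test of this kind is evidence, not a proof.

```haskell
-- The naive specification (renamed to avoid clashing with Prelude.reverse).
naiveReverse :: [a] -> [a]
naiveReverse []       = []
naiveReverse (x : xs) = naiveReverse xs ++ [x]

-- The calculated worker: one recursive call per element and no appends.
revcat :: [a] -> [a] -> [a]
revcat []       ys = ys
revcat (x : xs) ys = revcat xs (x : ys)

-- The wrapper.
fastReverse :: [a] -> [a]
fastReverse xs = revcat xs []

main :: IO ()
main = do
  let inputs = [[], [1], [1 .. 5], [1 .. 100]] :: [[Int]]
  -- Specification: revcat xs ys == naiveReverse xs ++ ys
  print (and [revcat xs ys == naiveReverse xs ++ ys | xs <- inputs, ys <- inputs])
  -- The wrapper agrees with the naive reverse.
  print (all (\xs -> fastReverse xs == naiveReverse xs) inputs)
```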

Unfortunately, there are a number of problems with this approach. Firstly, we calculated revcat using the fold-unfold style of program calculation. This is an informal calculation, which fails to guarantee total correctness. Thus the resulting reverse function may fail in some cases where the original succeeded. Secondly, while we are applying the common pattern of factorising a program into a worker and a wrapper, the reasoning we use is ad hoc and does not take advantage of this. We would like to abstract out this pattern to make future applications of this technique more straightforward. Finally, while intuitively we can see an efficiency gain from the use of associativity of ++, this is not a rigorous argument. Put simply, we need rigorous proofs of both correctness and improvement for our transformation.

3. Worker/Wrapper Transformation

The worker/wrapper transformation, as originally formulated by Gill and Hutton [?], allowed a function written using general recursion to be split into a recursive worker function and a wrapper function that allows the new definition to be used in the same contexts as the original. The usual application of this technique is to write the worker using a different type from the original program, one that supports more efficient operations, thus hopefully resulting in a more efficient program overall. Gill and Hutton gave conditions for the correctness of the transformation; here we present the more general theory and correctness conditions recently developed by Sculthorpe and Hutton [25].

3.1 The Fix Theory 

The idea of the worker/wrapper transformation for fixed points is as follows. Given a recursive program prog of some type A, we can write prog as some function f of itself:

prog :: A
prog = f prog

We can rewrite this definition so that it is explicitly written using the well-known fixpoint operator fix:

fix :: (a -> a) -> a
fix f = f (fix f)

resulting in the following definition:

prog = fix f

Next, we write functions abs :: B -> A and rep :: A -> B that allow us to convert between the original type A and some other type B that supports more efficient operations. We finish by constructing a new function g :: B -> B that allows us to rewrite our original definition of prog as follows:

prog = abs (fix g)

Here abs is the wrapper function, while fix g is the worker.
The pattern of the worker/wrapper transformation can be captured by a theorem that expresses necessary and sufficient conditions for its correctness [25]. This theorem has assumptions that express the required relationship between the functions abs and rep, and conditions that provide a specification for the function g in terms of abs, rep and f:
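Theorem 1 (Worker/Wrapper Factorisation).

Given

$$
\begin{array}{ll}
\mathit{abs} : B \to A & f : A \to A\\
\mathit{rep} : A \to B & g : B \to B
\end{array}
$$

satisfying one of the assumptions

$$
\begin{array}{ll}
(A) & \mathit{abs} \circ \mathit{rep} = \mathit{id}_A\\
(B) & \mathit{abs} \circ \mathit{rep} \circ f = f\\
(C) & \mathit{fix}\,(\mathit{abs} \circ \mathit{rep} \circ f) = \mathit{fix}\,f
\end{array}
$$

and one of the conditions

$$
\begin{array}{llll}
(1) & g = \mathit{rep} \circ f \circ \mathit{abs} & (1\beta) & \mathit{fix}\,g = \mathit{fix}\,(\mathit{rep} \circ f \circ \mathit{abs})\\
(2) & g \circ \mathit{rep} = \mathit{rep} \circ f & (2\beta) & \mathit{fix}\,g = \mathit{rep}\,(\mathit{fix}\,f)\\
(3) & f \circ \mathit{abs} = \mathit{abs} \circ g & &
\end{array}
$$

we have the factorisation

$$\mathit{fix}\,f = \mathit{abs}\,(\mathit{fix}\,g)$$

The different assumptions and conditions allow one to choose whichever will be easiest to verify.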

3.2 Proving Fast Reverse Correct 

Recall once again the naive definition of reverse:

reverse :: [a] -> [a]
reverse [] = []
reverse (x : xs) = reverse xs ++ [x]

As we mentioned before, this naive implementation is inefficient due to the use of the append operation ++. We would like to use worker/wrapper factorisation to improve it. The first step is to rewrite the function using fix:

reverse :: [a] -> [a]
reverse = fix rev

rev :: ([a] -> [a]) -> ([a] -> [a])
rev r [] = []
rev r (x : xs) = r xs ++ [x]
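The fix-based definition is directly executable using fix from the standard Data.Function module, which computes the same fixed point as the definition in Section 3.1 (it is written with a let so that the result is shared). In the following sketch, reverseFix is a hypothetical name chosen to avoid a clash with Prelude's reverse.

```haskell
import Data.Function (fix)

-- One unfolding of the naive reverse: r stands for the recursive call.
rev :: ([a] -> [a]) -> ([a] -> [a])
rev _ []       = []
rev r (x : xs) = r xs ++ [x]

-- The naive reverse recovered as the fixed point of rev.
reverseFix :: [a] -> [a]
reverseFix = fix rev

main :: IO ()
main = print (reverseFix [1 .. 5 :: Int] == reverse [1 .. 5])
```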

The next step in applying worker/wrapper is to select a new type to replace the original type [a] -> [a], and to write abs and rep functions to perform the conversions. We can represent a list xs by its difference list \ys -> xs ++ ys, as first demonstrated by Hughes [12]. Difference lists have the advantage that the usually costly operation ++ can be implemented with function composition, typically leading to an increase in efficiency. We write the following functions to convert between the two representations:

type DiffList a = [a] -> [a]

toDiff :: [a] -> DiffList a
toDiff xs = \ys -> xs ++ ys

fromDiff :: DiffList a -> [a]
fromDiff h = h []

We have fromDiff ∘ toDiff = id:

  fromDiff (toDiff xs)
= { definition of toDiff }
  fromDiff (\ys -> xs ++ ys)
= { definition of fromDiff }
  (\ys -> xs ++ ys) []
= { β-reduction }
  xs ++ []
= { [] is identity of ++ }
  xs
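The difference-list functions run as written. The sketch below checks fromDiff ∘ toDiff = id on one input and illustrates the point made above that append on difference lists becomes function composition; appendDiff is a hypothetical helper name, not part of the original development.

```haskell
-- A list xs is represented by the function that prepends xs.
type DiffList a = [a] -> [a]

toDiff :: [a] -> DiffList a
toDiff xs = \ys -> xs ++ ys

fromDiff :: DiffList a -> [a]
fromDiff h = h []

-- Hypothetical helper: append on difference lists is just composition.
appendDiff :: DiffList a -> DiffList a -> DiffList a
appendDiff = (.)

main :: IO ()
main = do
  let xs = [1, 2, 3] :: [Int]
      ys = [4, 5]
  print (fromDiff (toDiff xs) == xs)                                -- fromDiff . toDiff = id
  print (fromDiff (appendDiff (toDiff xs) (toDiff ys)) == xs ++ ys) -- composition models ++
```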



From these functions it is straightforward to create the actual abs and rep functions. These convert between the original function type [a] -> [a] and a new function type [a] -> DiffList a, where the returned value is represented as a difference list rather than a regular list:

rep :: ([a] -> [a]) -> ([a] -> DiffList a)
rep h = toDiff . h

abs :: ([a] -> DiffList a) -> ([a] -> [a])
abs h = fromDiff . h

Assumption (A) holds trivially:

  abs (rep h)
= { definitions of abs and rep }
  fromDiff . toDiff . h
= { fromDiff . toDiff = id }
  h
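Assumption (A) can also be spot-checked pointwise. In this sketch the sample function h is arbitrary and chosen only for illustration, and Prelude's abs is hidden so that the paper's name can be reused.

```haskell
import Prelude hiding (abs)

type DiffList a = [a] -> [a]

toDiff :: [a] -> DiffList a
toDiff xs = \ys -> xs ++ ys

fromDiff :: DiffList a -> [a]
fromDiff h = h []

rep :: ([a] -> [a]) -> ([a] -> DiffList a)
rep h = toDiff . h

abs :: ([a] -> DiffList a) -> ([a] -> [a])
abs h = fromDiff . h

main :: IO ()
main = do
  -- An arbitrary sample function of type [Int] -> [Int].
  let h = map (* 2) . filter even :: [Int] -> [Int]
      inputs = [[], [1 .. 10], [3, 4, 4]]
  -- Assumption (A), abs . rep = id, checked at h on a few inputs.
  print (all (\xs -> abs (rep h) xs == h xs) inputs)
```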

Now we must verify that the definition of revcat that we calculated in the previous section

revcat [] ys = ys
revcat (x : xs) ys = revcat xs (x : ys)

satisfies one of the worker/wrapper conditions. We first rewrite revcat as an explicit fixed point:

revcat = fix rev'

rev' h [] ys = ys
rev' h (x : xs) ys = h xs (x : ys)

We now verify condition (2), rev' ∘ rep = rep ∘ rev, which expands to rev' (rep r) xs = rep (rev r) xs. We calculate from the right-hand side, performing case analysis on xs. Firstly, we calculate for the case when xs is empty:

  rep (rev r) []
= { definition of rep }
  toDiff (rev r [])
= { definition of rev }
  toDiff []
= { definition of toDiff }
  \ys -> [] ++ ys
= { [] is identity of ++ }
  \ys -> ys
= { definition of rev' }
  rev' (rep r) []

and then for the case where xs is non-empty:

  rep (rev r) (x : xs)
= { definition of rep }
  toDiff (rev r (x : xs))
= { definition of rev }
  toDiff (r xs ++ [x])
= { definition of toDiff }
  \ys -> (r xs ++ [x]) ++ ys
= { associativity and definition of ++ }
  \ys -> r xs ++ (x : ys)
= { definition of toDiff }
  \ys -> toDiff (r xs) (x : ys)
= { definition of rep }
  \ys -> rep r xs (x : ys)
= { definition of rev' }
  rev' (rep r) (x : xs)

For total correctness on infinite lists we must also verify that the condition holds for the undefined value ⊥:

  rep (rev r) ⊥
= { definition of rep }
  toDiff (rev r ⊥)
= { rev pattern matches on second argument }
  toDiff ⊥
= { definition of toDiff }
  \ys -> ⊥ ++ ys
= { ++ strict in first argument }
  \ys -> ⊥
= { rev' pattern matches on second argument }
  rev' (rep r) ⊥
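Condition (2) can likewise be spot-checked on finite inputs for a particular r. This is only a sanity check of the calculation above: it tests one arbitrary sample r chosen for illustration, and it cannot exercise the ⊥ case, which is why that case needed the separate argument.

```haskell
type DiffList a = [a] -> [a]

toDiff :: [a] -> DiffList a
toDiff xs = \ys -> xs ++ ys

rep :: ([a] -> [a]) -> ([a] -> DiffList a)
rep h = toDiff . h

rev :: ([a] -> [a]) -> ([a] -> [a])
rev _ []       = []
rev r (x : xs) = r xs ++ [x]

rev' :: ([a] -> DiffList a) -> ([a] -> DiffList a)
rev' _ []       ys = ys
rev' h (x : xs) ys = h xs (x : ys)

main :: IO ()
main = do
  -- An arbitrary sample r; condition (2) should hold for every r.
  let r = map (+ 1) :: [Int] -> [Int]
      lists = [[], [1], [1 .. 4], [5, 3, 9]]
  -- Condition (2) applied to both remaining arguments:
  -- rev' (rep r) xs ys == rep (rev r) xs ys
  print (and [rev' (rep r) xs ys == rep (rev r) xs ys | xs <- lists, ys <- lists])
```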

Now that we know our rev' satisfies condition (2), we have a new definition of reverse:

reverse = abs revcat = fromDiff . revcat

which eta-expands as follows:

reverse xs = revcat xs []

revcat [] ys = ys
revcat (x : xs) ys = revcat xs (x : ys)

The end result is the same improved definition of reverse we had before. Thus the worker/wrapper theory has allowed us to formally verify the correctness of our earlier transformation. Furthermore, the use of a general theory has allowed us to avoid the induction that would usually be needed to reason about recursive definitions.
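Putting the pieces together, the factorisation fix rev = abs (fix rev') can be run and compared on finite lists. In this sketch original and factorised are hypothetical names for the two sides, fix comes from Data.Function, and Prelude's abs is hidden so that the paper's name can be reused.

```haskell
import Prelude hiding (abs)
import Data.Function (fix)

type DiffList a = [a] -> [a]

fromDiff :: DiffList a -> [a]
fromDiff h = h []

abs :: ([a] -> DiffList a) -> ([a] -> [a])
abs h = fromDiff . h

rev :: ([a] -> [a]) -> ([a] -> [a])
rev _ []       = []
rev r (x : xs) = r xs ++ [x]

rev' :: ([a] -> DiffList a) -> ([a] -> DiffList a)
rev' _ []       ys = ys
rev' h (x : xs) ys = h xs (x : ys)

-- The original program and its factorisation fix rev = abs (fix rev').
original, factorised :: [a] -> [a]
original   = fix rev
factorised = abs (fix rev')

main :: IO ()
main = print (all (\n -> original [1 .. n] == factorised [1 .. n :: Int]) [0 .. 50])
```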

4. Improvement Theory 

Thus far we have only reasoned about correctness. In order to develop a worker/wrapper theory that can prove efficiency properties, we need an operational theory of program improvement. More than just expressing extensional information, this should be based on intensional properties of the resources that a program requires. For the purpose of this paper, the resource we shall consider is execution time.

We have two main design goals for our operational theory. Firstly, it ought to be based on the operational semantics of a realistic programming language, so that the conclusions we draw from it are as applicable as possible. Secondly, it should be amenable to techniques such as (in)equational reasoning, as these are the techniques we used to apply the worker/wrapper correctness theory.

For the first goal, we use a language with similar syntax and semantics to GHC Core, except that arguments to functions are required to be atomic, as was the case in earlier versions of the language [20]. (Normalisation of the current version of GHC Core into this form is straightforward.) The language is call-by-need, reflecting the use of lazy evaluation in Haskell. The efficiency behaviour of call-by-need programs is notoriously counterintuitive. Our hope is that, by providing formal techniques for reasoning about call-by-need efficiency, we will go some way toward easing this problem.

For the second goal, our theory must be based around a relation R that is a preorder, as transitivity and reflexivity are necessary for inequational reasoning to be valid. Furthermore, to support reasoning in a compositional manner, it is essential to allow substitution. That is, given terms M and N, if M R N then C[M] R C[N] should also hold for any context C. A relation R that satisfies both of these properties is called a precongruence.

A naive approach to measuring execution time would be to simply count the number of steps taken to evaluate a term to some normal form, and consider that a term M is more efficient than a term N if its evaluation finishes in fewer steps. The resulting relation is clearly a preorder; however, it is not a precongruence in a call-by-need setting, because meaningful computations can be done with terms that are not fully normalised. For example, just because M normalises and N does not, it does not follow that M is necessarily more efficient in all contexts.

The approach we use is due to Moran and Sands [15]. Rather than counting the steps taken to normalise a term, we compare the steps taken in all contexts, and only say that M is improved by N if, for any context C, the term C[N] requires no more evaluation steps than the term C[M]. The result is a relation that is trivially a precongruence: it inherits transitivity and reflexivity from the numerical ordering ≤, and is substitutive by definition.

Improvement theory [23] was originally developed for call-by-name languages by Sands [21]. The remainder of this section presents the call-by-need time improvement theory due to Moran and Sands [15], which will provide the setting for our operational worker/wrapper theory. The essential difference between call-by-name and call-by-need is that the latter implements a sharing strategy, avoiding the repeated evaluation of terms that are used more than once.

4.1 Operational Semantics of the Core Language 

We shall begin by presenting the operational model that forms the basis of this improvement theory. The semantics presented here is originally due to Sestoft [27].

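We start from a set of variables Var and a set of constructors Con. We assume all constructors have a fixed arity. The grammar of terms is as follows:

$$
\begin{array}{rcl}
x, y, z & \in & \mathit{Var}\\
c & \in & \mathit{Con}\\
M, N & ::= & x \mid \lambda x \to M \mid M\,x \mid \mathsf{let}\ \{\vec{x} = \vec{M}\}\ \mathsf{in}\ N \mid c\,\vec{x} \mid \mathsf{case}\ M\ \mathsf{of}\ \{c_i\,\vec{x_i} \to N_i\}
\end{array}
$$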

We use $\vec{x} = \vec{M}$ as a shorthand for a list of bindings of the form x = M. Similarly, we use $c_i\,\vec{x_i} \to N_i$ as a shorthand for a list of cases of the form $c\,\vec{x} \to N$. All constructors are assumed to be saturated; that is, we assume that any $\vec{x}$ that is the operand of a constructor c has length equal to the arity of c. Literals are represented by constructors of arity 0. We treat α-equivalent terms as identical.

A term is a value if it is of the form $c\,\vec{x}$ or $\lambda x \to M$. In Haskell this is referred to as weak head normal form. We shall use letters such as V, W to denote value terms.
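Haskell's seq evaluates its first argument only to weak head normal form, so it can be used to observe which terms already count as values. The following sketch is ordinary GHC Haskell rather than the core language above (whose constructor arguments are restricted to variables); it forces a cons cell and a lambda whose components are undefined without raising an error.

```haskell
main :: IO ()
main = do
  -- A constructor application: seq stops at the outer (:) and never
  -- touches the undefined head or tail.
  let consCell = (undefined : undefined) :: [Int]
  putStrLn (consCell `seq` "cons cell is already in WHNF")
  -- A lambda abstraction is in WHNF even if its body would fail.
  let lambda = \x -> x + (undefined :: Int)
  putStrLn (lambda `seq` "lambda is already in WHNF")
  -- length only walks the spine, so the undefined elements are never evaluated.
  print (length [undefined, undefined :: Int])
```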

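Term contexts take the following form, with substitution defined in the obvious way.

$$C, D ::= [\cdot] \mid x \mid \lambda x \to C \mid C\,x \mid \mathsf{let}\ \{\vec{x} = \vec{C}\}\ \mathsf{in}\ D \mid c\,\vec{x} \mid \mathsf{case}\ C\ \mathsf{of}\ \{c_i\,\vec{x_i} \to D_i\}$$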

A value context is a context that is either a lambda abstrac- 
tion or a constructor applied to variables. 

The restriction that the arguments of functions and constructors always be variables has the effect that all bindings made during evaluation must have been created by a let. Sometimes we will use M N (where N is not a variable) as a shorthand for let {x = N} in M x, where x is fresh. We use this shorthand for both terms and contexts.

$$
\begin{array}{rcll}
(\Gamma\{x = M\},\ x,\ S) & \to & (\Gamma,\ M,\ \#x : S) & \{\mathit{Lookup}\}\\
(\Gamma,\ V,\ \#x : S) & \to & (\Gamma\{x = V\},\ V,\ S) & \{\mathit{Update}\}\\
(\Gamma,\ M\,x,\ S) & \to & (\Gamma,\ M,\ x : S) & \{\mathit{Unwind}\}\\
(\Gamma,\ \lambda x \to M,\ y : S) & \to & (\Gamma,\ M[y/x],\ S) & \{\mathit{Subst}\}\\
(\Gamma,\ \mathsf{case}\ M\ \mathsf{of}\ \mathit{alts},\ S) & \to & (\Gamma,\ M,\ \mathit{alts} : S) & \{\mathit{Case}\}\\
(\Gamma,\ c_j\,\vec{y},\ \{c_i\,\vec{x_i} \to N_i\} : S) & \to & (\Gamma,\ N_j[\vec{y}/\vec{x_j}],\ S) & \{\mathit{Branch}\}\\
(\Gamma,\ \mathsf{let}\ \{\vec{x} = \vec{M}\}\ \mathsf{in}\ N,\ S) & \to & (\Gamma\{\vec{x} = \vec{M}\},\ N,\ S) & \{\mathit{Letrec}\}
\end{array}
$$

Figure 1. The call-by-need abstract machine

An abstract machine for executing terms in the language maintains a state (Γ, M, S) consisting of: a heap Γ, given by a set of bindings from variables to terms; the term M currently being evaluated; and the evaluation stack S, given by a list of tokens used by the abstract machine. The machine works by evaluating the current term to a value, and then deciding what to do with the value based on the top of the stack. Bindings generated by let constructs are put on the heap, and only taken off when performing a Lookup. A Lookup executes by putting a token on the stack representing where the term was looked up, and then evaluating that term to value form before replacing it on the heap. In this way, each binding is evaluated at most once. The semantics of the machine is given in Figure 1. Note that the Letrec rule assumes that $\vec{x}$ is disjoint from the domain of Γ; if not, we need only α-rename so that this is the case.
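To make the rules concrete, the following is a minimal executable sketch of the machine in Figure 1 that also counts Lookup steps, the cost measure used in the next subsection. It is an illustration under stated assumptions, not the implementation used by Moran and Sands: the names Term, run, tick and shared are hypothetical; the Letrec rule renames each let-bound variable to a fresh heap name containing '#', so that the disjointness assumption always holds; and the input is assumed to be a closed term whose source binders never contain '#'. Data.Map comes from the containers package that ships with GHC.

```haskell
import qualified Data.Map as Map

type Var = String
type Con = String

-- Core terms: function and constructor arguments are always variables.
data Term
  = V Var
  | Lam Var Term
  | App Term Var
  | Let [(Var, Term)] Term
  | ConApp Con [Var]
  | Case Term [(Con, [Var], Term)]
  deriving Show

-- Stack tokens: an argument x, an update marker #x, or case alternatives.
data Token = Arg Var | Upd Var | Alts [(Con, [Var], Term)]

isValue :: Term -> Bool
isValue (Lam _ _)    = True
isValue (ConApp _ _) = True
isValue _            = False

-- Variable-for-variable substitution t[y/x], respecting shadowing. Heap names
-- contain '#', which source binders never do (assumption), so y is never captured.
subst :: Var -> Var -> Term -> Term
subst x y t = case t of
  V z -> V (sv z)
  Lam z b
    | z == x    -> t
    | otherwise -> Lam z (subst x y b)
  App m z -> App (subst x y m) (sv z)
  Let bs b
    | x `elem` map fst bs -> t
    | otherwise -> Let [(v, subst x y m) | (v, m) <- bs] (subst x y b)
  ConApp c zs -> ConApp c (map sv zs)
  Case m alts -> Case (subst x y m) (map alt alts)
  where
    sv z = if z == x then y else z
    alt a@(c, zs, n)
      | x `elem` zs = a
      | otherwise   = (c, zs, subst x y n)

-- Run from (empty heap, t, empty stack). Returns the final value and the number
-- of Lookup steps, or Nothing if the machine gets stuck (e.g. on a black hole).
run :: Term -> Maybe (Term, Int)
run t0 = go Map.empty t0 [] (0 :: Int) 0
  where
    go heap t stack fresh n = case (t, stack) of
      (V x, s) -> do                                           -- Lookup
        m <- Map.lookup x heap
        go (Map.delete x heap) m (Upd x : s) fresh (n + 1)
      (_, Upd x : s) | isValue t ->                            -- Update
        go (Map.insert x t heap) t s fresh n
      (App m x, s) -> go heap m (Arg x : s) fresh n            -- Unwind
      (Lam x m, Arg y : s) -> go heap (subst x y m) s fresh n  -- Subst
      (Case m alts, s) -> go heap m (Alts alts : s) fresh n    -- Case
      (ConApp c ys, Alts alts : s) ->                          -- Branch
        case [(xs, b) | (c', xs, b) <- alts, c' == c, length xs == length ys] of
          (xs, b) : _ -> go heap (foldr (uncurry subst) b (zip xs ys)) s fresh n
          []          -> Nothing
      (Let bs b, s) ->                                         -- Letrec, with fresh renaming
        let names = [v ++ "#" ++ show i | (v, i) <- zip (map fst bs) [fresh ..]]
            ren e = foldr (uncurry subst) e (zip (map fst bs) names)
            heap' = foldr (\(v, m) -> Map.insert v (ren m)) heap (zip names (map snd bs))
        in go heap' (ren b) s (fresh + length bs) n
      (_, []) | isValue t -> Just (t, n)                       -- final state
      _ -> Nothing                                             -- stuck

true :: Term
true = ConApp "True" []

-- The tick operation let { t = M } in t, assuming t is not free in M.
tick :: Term -> Term
tick m = Let [("t", m)] (V "t")

-- x is used twice, but its ticked binding is evaluated only once.
shared :: Term
shared = Let [("x", tick true)] (Case (V "x") [("True", [], V "x")])

main :: IO ()
main = mapM_ (print . fmap snd . run) [true, tick true, tick (tick true), shared]
```

Running it prints Just 0, Just 1, Just 2 and Just 3: a value costs no Lookups, each tick adds exactly one, and in shared the second use of x finds the already-updated value on the heap, so the tick inside its binding is paid only once.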

4.2 The Cost Model and Improvement Relations 

Now that we have a semantics for our model, we must devise a cost model for this semantics. The natural way to do this for an operational semantics is to count the steps taken to evaluate a given term. We use the notation $M{\Downarrow^{n}}$ to mean that the abstract machine progresses from the initial state $(\emptyset, M, \epsilon)$ to some final state $(\Gamma, V, \epsilon)$ with n occurrences of the Lookup step. It is sufficient to count Lookup steps because the total number of steps is bounded by a linear function of the number of Lookup steps [15]. Furthermore, we use the notation $M{\Downarrow^{\leq n}}$ to mean that $M{\Downarrow^{m}}$ for some $m \leq n$.

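From this, we can define our improvement relation. We say that "M is improved by N", written $M \mathrel{\underset{\sim}{\triangleright}} N$, if the following statement holds for all contexts C:

$$C[M]{\Downarrow^{m}} \implies C[N]{\Downarrow^{\leq m}}$$

In other words, a term M is improved by a term N if N takes no more steps to evaluate than M in all contexts. That this relation is substitutive follows immediately from the definition, and that it is a preorder follows from the fact that ≤ is itself a preorder; hence it is a precongruence. We sometimes write $M \mathrel{\underset{\sim}{\triangleleft}} N$ for $N \mathrel{\underset{\sim}{\triangleright}} M$. If both $M \mathrel{\underset{\sim}{\triangleright}} N$ and $M \mathrel{\underset{\sim}{\triangleleft}} N$, we write $M \mathrel{\underset{\sim}{\triangleleft\triangleright}} N$ and say that M and N are cost-equivalent.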

For convenience, we define a "tick" operation on terms that adds exactly one unit of cost to a term:

$$\checkmark M = \mathsf{let}\ \{x = M\}\ \mathsf{in}\ x \qquad (\text{where } x \text{ is not free in } M)$$

This definition of ✓M takes exactly two steps to evaluate to M: one to add the binding to the heap, and the other to look it up. Only one of these steps is a Lookup step, so the result is that the cost of evaluating the term is increased by exactly one. Using ticks allows us to annotate terms with individual units of cost, so that we can use rules to "push" cost around a term, making the calculations more convenient. We could also define the tick operation by adding it to the grammar of terms and modifying the abstract machine and cost model accordingly, but this definition is equivalent. We have the following law: $\checkmark M \mathrel{\underset{\sim}{\triangleright}} M$.
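For reference, the machine trace behind this argument is sketched below using the rules of Figure 1, assuming x is fresh. After the Lookup, M is evaluated to some value V using n Lookups of its own (Γ' denotes the heap at that point), and the Update step then restores the binding, so the whole term uses n + 1 Lookups.

```latex
\begin{array}{lll}
(\Gamma,\ \checkmark M,\ S) & = (\Gamma,\ \mathsf{let}\ \{x = M\}\ \mathsf{in}\ x,\ S) & \\
 & \to (\Gamma\{x = M\},\ x,\ S) & \{\mathit{Letrec}\}\\
 & \to (\Gamma,\ M,\ \#x : S) & \{\mathit{Lookup}\}\\
 & \to^{*} (\Gamma',\ V,\ \#x : S) & \{n\ \text{Lookups while evaluating } M\}\\
 & \to (\Gamma'\{x = V\},\ V,\ S) & \{\mathit{Update}\}
\end{array}
```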

The improvement relation $\mathrel{\underset{\sim}{\triangleright}}$ covers the case where one term is at least as efficient as another in all contexts, but this is a very strong statement. We use the notion of "weak improvement" when one term is at least as efficient as another within a constant factor. Specifically, we say M is weakly improved by N, written $M \mathrel{\underset{\approx}{\triangleright}} N$, if there exists a linear function $f(x) = kx + c$ (where $k, c \geq 0$) such that the following statement holds for all contexts C:

$$C[M]{\Downarrow^{m}} \implies C[N]{\Downarrow^{\leq f(m)}}$$

This can be read as "replacing M with N may make programs worse, but cannot make them asymptotically worse". We use the symbols $\mathrel{\underset{\approx}{\triangleleft}}$ and $\mathrel{\underset{\approx}{\triangleleft\triangleright}}$ for inverse and equivalence analogously to standard improvement.

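Because weak improvement ignores constant factors, we have the following tick introduction/elimination law:

$$M \mathrel{\underset{\approx}{\triangleleft\triangleright}} \checkmark M$$

It follows from this that any improvement $M \mathrel{\underset{\sim}{\triangleright}} N$ can be weakened to a weak improvement $M' \mathrel{\underset{\approx}{\triangleright}} N'$, where M' and N' denote the terms M and N with all the ticks removed.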

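The last notation we define is entailment, which is used when we have a chain of improvements that all apply with respect to a particular set of definitions. Specifically, where $\Gamma = \{\vec{x} = \vec{V}\}$ is a list of bindings, we write:

$$\Gamma \vdash M_1 \mathrel{\underset{\sim}{\triangleright}} M_2 \mathrel{\underset{\sim}{\triangleright}} \cdots \mathrel{\underset{\sim}{\triangleright}} M_n$$

to mean:

$$\mathsf{let}\ \Gamma\ \mathsf{in}\ M_1 \mathrel{\underset{\sim}{\triangleright}} \mathsf{let}\ \Gamma\ \mathsf{in}\ M_2 \mathrel{\underset{\sim}{\triangleright}} \cdots \mathrel{\underset{\sim}{\triangleright}} \mathsf{let}\ \Gamma\ \mathsf{in}\ M_n$$

4.3 Selected Laws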

We finish this section with a selection of laws taken from [15]. The first two are β-reduction rules. The following cost equivalence holds for function application:

$$(\lambda x \to M)\,y \mathrel{\underset{\sim}{\triangleleft\triangleright}} M[y/x]$$

This holds because the abstract machine evaluates the left-hand side to the right-hand side without performing any Lookups, resulting in the same heap and stack as before. Note that the substitution is variable-for-variable, as the grammar for our language requires that the argument to function application always be a variable.
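Concretely, using the rules of Figure 1, the machine takes only an Unwind step and a Subst step here, neither of which is a Lookup, so the cost is unchanged:

```latex
\begin{array}{lll}
(\Gamma,\ (\lambda x \to M)\,y,\ S) & \to (\Gamma,\ \lambda x \to M,\ y : S) & \{\mathit{Unwind}\}\\
 & \to (\Gamma,\ M[y/x],\ S) & \{\mathit{Subst}\}
\end{array}
```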

In general, where a term M can be evaluated to a term M', we have the following relationships:

$$M \mathrel{\underset{\sim}{\triangleright}} M' \qquad\qquad M' \mathrel{\underset{\approx}{\triangleleft\triangleright}} M$$

The latter fact may be non-obvious, but it holds because evaluating a term will produce a constant number of ticks, and tick elimination is a weak cost-equivalence. In this manner we can see that partial evaluation by itself will never save more than a constant factor of time.

The following cost equivalence allows us to replace a variable by its binding. However, note that this is only valid for values, as bindings to other terms will be modified in the course of execution. We thus call this rule value-β.

$$\mathsf{let}\ \{x = V, \vec{y} = \vec{C}[x]\}\ \mathsf{in}\ D[x] \mathrel{\underset{\sim}{\triangleleft\triangleright}} \mathsf{let}\ \{x = V, \vec{y} = \vec{C}[\checkmark V]\}\ \mathsf{in}\ D[\checkmark V]$$

The following law allows us to move let bindings in and out of a context when the binding is to a value. Note that we assume that $\vec{x}$ does not appear free in C, which can be ensured by α-renaming, and that no free variables in $\vec{V}$ are captured in C. We call this rule value let-floating.

$$C[\mathsf{let}\ \{\vec{x} = \vec{V}\}\ \mathsf{in}\ M] \mathrel{\underset{\sim}{\triangleleft\triangleright}} \mathsf{let}\ \{\vec{x} = \vec{V}\}\ \mathsf{in}\ C[M]$$

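We also have a garbage collection law allowing us to remove unused bindings. Assuming that $\vec{x}$ is not free in $\vec{N}$ or L, we have the following cost equivalence:

$$\mathsf{let}\ \{\vec{x} = \vec{M}, \vec{y} = \vec{N}\}\ \mathsf{in}\ L \mathrel{\underset{\sim}{\triangleleft\triangleright}} \mathsf{let}\ \{\vec{y} = \vec{N}\}\ \mathsf{in}\ L$$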

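The final law we present here is the rule of improvement induction. The version that we present is stronger than the version in [15], but can be obtained by a simple modification of the proof given there. For any set of value bindings Γ and context C, we have the following rule:

$$\frac{\Gamma \vdash M \mathrel{\underset{\sim}{\triangleright}} \checkmark C[M] \qquad \Gamma \vdash \checkmark C[N] \mathrel{\underset{\sim}{\triangleright}} N}{\Gamma \vdash M \mathrel{\underset{\sim}{\triangleright}} N}$$

This allows us to prove $M \mathrel{\underset{\sim}{\triangleright}} N$ simply by finding a context C where we can "unfold" M to $\checkmark C[M]$ and "fold" $\checkmark C[N]$ to N. In other words, the following proof is valid:

$$
\begin{array}{ll}
 & \Gamma \vdash M\\
\mathrel{\underset{\sim}{\triangleright}} & \\
 & \checkmark C[M]\\
\mathrel{\underset{\sim}{\triangleright}} & \{\ \text{hypothesis}\ \}\\
 & \checkmark C[N]\\
\mathrel{\underset{\sim}{\triangleright}} & \\
 & N
\end{array}
$$

In this way the technique is similar to proof principles such as guarded coinduction [4], [28].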

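As a corollary to this law, we have the following law for cost-equivalence improvement induction. For any set of value bindings Γ and context C, we have:

$$\frac{\Gamma \vdash M \mathrel{\underset{\sim}{\triangleleft\triangleright}} \checkmark C[M] \qquad \Gamma \vdash \checkmark C[N] \mathrel{\underset{\sim}{\triangleleft\triangleright}} N}{\Gamma \vdash M \mathrel{\underset{\sim}{\triangleleft\triangleright}} N}$$

The proof is simply to start from the assumptions and make two applications of improvement induction: first to prove $M \mathrel{\underset{\sim}{\triangleright}} N$, and second to prove $N \mathrel{\underset{\sim}{\triangleright}} M$.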

5. Worker/Wrapper and Improvement 

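In this section, we prove a factorisation theorem for improvement theory analogous to the worker/wrapper factorisation theorem given in section 3.1. Before we do this, however, we must prove two preliminary results: a rolling rule and a fusion rule. Rolling and fusion are central to the worker/wrapper transformation [?], [13], so it is only natural that we would need versions of these to apply the worker/wrapper transformation in this context.

5.1 Preliminary Results

The first rule we prove is the rolling rule, so named because of its similarity to the rolling rule for least fixed points. In particular, for any pair of value contexts F, G, we have the following weak cost equivalence:

$$\mathsf{let}\ \{x = F[G[x]]\}\ \mathsf{in}\ G[x] \mathrel{\underset{\approx}{\triangleleft\triangleright}} \mathsf{let}\ \{x = G[F[x]]\}\ \mathsf{in}\ x$$

The proof begins with an application of cost-equivalence improvement induction. We let $\Gamma = \{x = F[\checkmark G[x]],\ y = G[\checkmark F[y]]\}$, $M = \checkmark G[x]$, $N = y$ and $C = G[\checkmark F[[\cdot]]]$. The premises of induction are proved as follows:

$$
\begin{array}{ll}
 & \Gamma \vdash M\\
= & \{\ \text{definitions}\ \}\\
 & \checkmark G[x]\\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{value-}\beta\ \}\\
 & \checkmark G[\checkmark F[\checkmark G[x]]]\\
= & \{\ \text{definitions}\ \}\\
 & \checkmark C[M]
\end{array}
$$

and

$$
\begin{array}{ll}
 & \Gamma \vdash \checkmark C[N]\\
= & \{\ \text{definitions}\ \}\\
 & \checkmark G[\checkmark F[y]]\\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{value-}\beta\ \}\\
 & y\\
= & \{\ \text{definitions}\ \}\\
 & N
\end{array}
$$

Thus we can conclude $\Gamma \vdash M \mathrel{\underset{\sim}{\triangleleft\triangleright}} N$, or equivalently $\mathsf{let}\ \Gamma\ \mathsf{in}\ M \mathrel{\underset{\sim}{\triangleleft\triangleright}} \mathsf{let}\ \Gamma\ \mathsf{in}\ N$. We expand this out and apply garbage collection to remove the unused bindings:

$$\mathsf{let}\ \{x = F[\checkmark G[x]]\}\ \mathsf{in}\ \checkmark G[x] \mathrel{\underset{\sim}{\triangleleft\triangleright}} \mathsf{let}\ \{y = G[\checkmark F[y]]\}\ \mathsf{in}\ y$$

By applying α-renaming and weakening we obtain the desired result. The second rule we prove is letrec-fusion, analogous to fixed-point fusion. For any value contexts F, G, we have the following implication:

$$\frac{H[\checkmark F[x]] \mathrel{\underset{\sim}{\triangleright}} G[\checkmark H[x]]}{\mathsf{let}\ \{x = F[x]\}\ \mathsf{in}\ H[x] \mathrel{\underset{\approx}{\triangleright}} \mathsf{let}\ \{x = G[x]\}\ \mathsf{in}\ x}$$

For the proof, we assume the premise and proceed by improvement induction. Let $\Gamma = \{x = F[x],\ y = G[y]\}$, $M = \checkmark H[x]$, $N = y$ and $C = G$. The premises are proved by:

$$
\begin{array}{ll}
 & \Gamma \vdash M\\
= & \{\ \text{by definitions}\ \}\\
 & \checkmark H[x]\\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{value-}\beta\ \}\\
 & \checkmark H[\checkmark F[x]]\\
\mathrel{\underset{\sim}{\triangleright}} & \{\ \text{by assumption}\ \}\\
 & \checkmark G[\checkmark H[x]]\\
= & \{\ \text{definition}\ \}\\
 & \checkmark C[M]
\end{array}
$$

and

$$
\begin{array}{ll}
 & \Gamma \vdash \checkmark C[N]\\
= & \{\ \text{by definitions}\ \}\\
 & \checkmark G[y]\\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{value-}\beta\ \}\\
 & y\\
= & \{\ \text{definition}\ \}\\
 & N
\end{array}
$$

Thus we conclude that $\Gamma \vdash M \mathrel{\underset{\sim}{\triangleright}} N$. Expanding and applying garbage collection, we obtain the following:

$$\mathsf{let}\ \{x = F[x]\}\ \mathsf{in}\ \checkmark H[x] \mathrel{\underset{\sim}{\triangleright}} \mathsf{let}\ \{y = G[y]\}\ \mathsf{in}\ y$$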

Again we obtain the desired result via weakening and α-renaming. As improvement induction is symmetrical, we can also prove the following dual fusion law, in which the improvement relations are reversed:

$$\frac{H[\checkmark F[x]] \mathrel{\underset{\sim}{\triangleleft}} G[\checkmark H[x]]}{\mathsf{let}\ \{x = F[x]\}\ \mathsf{in}\ H[x] \mathrel{\underset{\approx}{\triangleleft}} \mathsf{let}\ \{x = G[x]\}\ \mathsf{in}\ x}$$

For both the rolling and fusion rules, we first proved a version of the conclusion with normal improvement, and then weakened it to weak improvement. We do this to avoid having to deal with ticks, and because the weaker version is strong enough for our purposes.

Moran and Sands also prove their own fusion law. This law requires that the context H satisfy a form of strictness. Specifically, for any value contexts F, G and fresh variable x, we have the following implication:

$$\frac{H[F[x]] \mathrel{\underset{\sim}{\triangleright}} G[H[x]] \qquad \mathit{strict}(H)}{\mathsf{let}\ \{x = F[x]\}\ \mathsf{in}\ C[H[x]] \mathrel{\underset{\sim}{\triangleright}} \mathsf{let}\ \{x = G[x]\}\ \mathsf{in}\ C[x]}$$

This version of fusion has the advantage of having a stronger conclusion, but its strictness side condition and lack of symmetry make it unsuitable for our purposes.
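The improvement-theoretic rules above mirror the rolling and fusion rules for least fixed points: fix (f ∘ g) = f (fix (g ∘ f)), and h (fix f) = fix g whenever h is strict and h ∘ f = g ∘ h. As a denotational sanity check only (it says nothing about cost), the following sketch tests both on one small example with lazy lists; the particular f, g and h are arbitrary choices for illustration.

```haskell
import Data.Function (fix)

-- Rolling: fix (f . g) == f (fix (g . f)), here with f = (1 :) and g = map (* 2).
rollingLhs, rollingRhs :: [Integer]
rollingLhs = fix ((1 :) . map (* 2))
rollingRhs = (1 :) (fix (map (* 2) . (1 :)))

-- Fusion: h strict and h . f == g . h imply h (fix f) == fix g.
-- Here h = map (* 2) (strict), f = (1 :), g = (2 :), since map (* 2) (1 : xs) == 2 : map (* 2) xs.
fusionLhs, fusionRhs :: [Integer]
fusionLhs = map (* 2) (fix (1 :))
fusionRhs = fix (2 :)

main :: IO ()
main = do
  -- Both sides are infinite lists, so only finite prefixes are compared.
  print (take 8 rollingLhs == take 8 rollingRhs)  -- both are 1, 2, 4, 8, ...
  print (take 8 fusionLhs == take 8 fusionRhs)    -- both are 2, 2, 2, ...
```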

5.2 The Worker/Wrapper Improvement Theorem

Using the above set of rules, we can prove the following worker/wrapper improvement theorem, giving conditions under which a program factorisation is a time improvement:

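Theorem 2 (Worker/Wrapper Improvement).

Given value contexts Abs, Rep, F, G for which x is free, satisfying one of the assumptions

$$
\begin{array}{ll}
(A) & \mathit{Abs}[\mathit{Rep}[x]] \mathrel{\underset{\approx}{\triangleleft\triangleright}} x\\
(B) & \mathit{Abs}[\mathit{Rep}[F[x]]] \mathrel{\underset{\approx}{\triangleleft\triangleright}} F[x]\\
(C) & \mathsf{let}\ x = \mathit{Abs}[\mathit{Rep}[F[x]]]\ \mathsf{in}\ x \mathrel{\underset{\approx}{\triangleleft\triangleright}} \mathsf{let}\ x = F[x]\ \mathsf{in}\ x
\end{array}
$$

and one of the conditions

$$
\begin{array}{ll}
(1) & G[x] \mathrel{\underset{\approx}{\triangleleft}} \mathit{Rep}[F[\mathit{Abs}[x]]]\\
(2) & G[\checkmark \mathit{Rep}[x]] \mathrel{\underset{\sim}{\triangleleft}} \mathit{Rep}[\checkmark F[x]]\\
(3) & \mathit{Abs}[\checkmark G[x]] \mathrel{\underset{\sim}{\triangleleft}} F[\checkmark \mathit{Abs}[x]]\\
(1\beta) & \mathsf{let}\ x = G[x]\ \mathsf{in}\ x \mathrel{\underset{\approx}{\triangleleft}} \mathsf{let}\ x = \mathit{Rep}[F[\mathit{Abs}[x]]]\ \mathsf{in}\ x\\
(2\beta) & \mathsf{let}\ x = G[x]\ \mathsf{in}\ x \mathrel{\underset{\approx}{\triangleleft}} \mathsf{let}\ x = F[x]\ \mathsf{in}\ \mathit{Rep}[x]
\end{array}
$$

we have the improvement

$$\mathsf{let}\ x = F[x]\ \mathsf{in}\ x \mathrel{\underset{\approx}{\triangleright}} \mathsf{let}\ x = G[x]\ \mathsf{in}\ \mathit{Abs}[x]$$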

Given a recursive program let x = F[x] in x and abstraction
and representation contexts Abs and Rep, this theorem
gives us conditions we can use to derive a factorised program
let x = G[x] in Abs[x]. This factorised program will be at
worst a constant factor slower than the original program, but
can potentially be asymptotically faster. In other words, we
have conditions that guarantee that such an optimisation is
"safe" with respect to time performance.
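Read denotationally, and ignoring costs entirely, the two programs related by the theorem have a familiar Haskell shape. The sketch below is only an illustration: it assumes Data.Function.fix as the meaning of a recursive let, the names original and factorised are our own, and nothing in it models ticks or the abstract machine.

```haskell
import Data.Function (fix)

-- let x = F[x] in x: the recursive program as a fixed point of its body.
original :: (a -> a) -> a
original body = fix body

-- let x = G[x] in Abs[x]: a wrapper (playing the role of Abs) applied
-- to the fixed point of a worker body (playing the role of G).
factorised :: (b -> a) -> (b -> b) -> a
factorised absFn worker = absFn (fix worker)
```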

The proof given in [25] for the original factorisation
theorem centers on the use of the rolling and fusion rules.
Because we have proven analogous rules in our setting, the
proofs can be adapted fairly straightforwardly, simply by
keeping the general form of the proofs and using the rules
of improvement theory as structural rules that fit between
the original steps. The details are as follows.

We begin by noting that (A) ⇒ (B) ⇒ (C), as in the
original case. The first implication (A) ⇒ (B) no longer
follows immediately, but the proof is simple. Letting y be a
fresh variable, we reason as follows:

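$$
\begin{array}{ll}
 & \mathit{Abs}[\mathit{Rep}[F[y]]] \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{garbage collection, value-}\beta\ \} \\
 & \mathbf{let}\ x = F[y]\ \mathbf{in}\ \mathit{Abs}[\mathit{Rep}[x]] \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{(A)}\ \} \\
 & \mathbf{let}\ x = F[y]\ \mathbf{in}\ x \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{value-}\beta\text{, garbage collection}\ \} \\
 & F[y]
\end{array}
$$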

The final step is to observe that as both x and y are fresh, we 
can substitute one for the other and the relationship between 
the terms will remain the same. Hence, we can conclude (B). 

As in the original theorem, we have that (1) implies
(1β) by simple application of substitution, (2) implies (2β)
by fusion, and (3) implies the conclusion, also by fusion.
Under assumption (C), we have that (1β) and (2β) are
equivalent. We show this by proving their right-hand sides
weakly cost-equivalent, after which we can simply apply
transitivity.

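$$
\begin{array}{ll}
 & \mathbf{let}\ x = F[x]\ \mathbf{in}\ \mathit{Rep}[x] \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{value-}\beta\ \} \\
 & \mathbf{let}\ x = F[x]\ \mathbf{in}\ \mathit{Rep}[F[x]] \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{value let-floating}\ \} \\
 & \mathit{Rep}[F[\mathbf{let}\ x = F[x]\ \mathbf{in}\ x]] \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{(C)}\ \} \\
 & \mathit{Rep}[F[\mathbf{let}\ x = \mathit{Abs}[\mathit{Rep}[F[x]]]\ \mathbf{in}\ x]] \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{value let-floating}\ \} \\
 & \mathbf{let}\ x = \mathit{Abs}[\mathit{Rep}[F[x]]]\ \mathbf{in}\ \mathit{Rep}[F[x]] \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{rolling}\ \} \\
 & \mathbf{let}\ x = \mathit{Rep}[F[\mathit{Abs}[x]]]\ \mathbf{in}\ x
\end{array}
$$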

Finally, we must show that condition (1β) and assumption
(C) together imply the conclusion. This follows exactly
the same pattern of reasoning as the original proof, with the
addition of two applications of value let-floating:

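$$
\begin{array}{ll}
 & \mathbf{let}\ x = F[x]\ \mathbf{in}\ x \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{(C)}\ \} \\
 & \mathbf{let}\ x = \mathit{Abs}[\mathit{Rep}[F[x]]]\ \mathbf{in}\ x \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{rolling}\ \} \\
 & \mathbf{let}\ x = \mathit{Rep}[F[\mathit{Abs}[x]]]\ \mathbf{in}\ \mathit{Abs}[x] \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{value let-floating}\ \} \\
 & \mathit{Abs}[\mathbf{let}\ x = \mathit{Rep}[F[\mathit{Abs}[x]]]\ \mathbf{in}\ x] \\
\mathrel{\underset{\approx}{\triangleright}} & \{\ (1\beta)\ \} \\
 & \mathit{Abs}[\mathbf{let}\ x = G[x]\ \mathbf{in}\ x] \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{value let-floating}\ \} \\
 & \mathbf{let}\ x = G[x]\ \mathbf{in}\ \mathit{Abs}[x]
\end{array}
$$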

We conclude this section by discussing a few important
points about the worker/wrapper improvement theorem and
its applications. Firstly, we note that assumption (A) will
never actually hold. To see this, we let Ω be a divergent
term; that is, one that the abstract machine will never
finish evaluating. By substituting into the context
let x = Ω in [−], we obtain the following weak cost-equivalence:

$$
\mathbf{let}\ x = \Omega\ \mathbf{in}\ \mathit{Abs}[\mathit{Rep}[x]] \mathrel{\underset{\approx}{\triangleleft\triangleright}} \mathbf{let}\ x = \Omega\ \mathbf{in}\ x
$$

This is clearly false, as the left-hand side will terminate 
almost immediately (as Abs is a value context), while the 
right-hand side will diverge. Thus we see that assumption 
(A) is impossible to satisfy. We leave it in the theorem 
for completeness of the analogy with the earlier theorem
from Section 3.1. In situations where (A) would have been
used with the earlier theory, the weaker assumption (B) 
can always be used instead. As we will see later with the 
examples, frequently only very few properties of the context 
F will be used in the proof of (B). A typed improvement 
theory might allow these properties to be assumed of x 
instead, thus making (A) useful again. 
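The same phenomenon can be observed in GHC. In the sketch below, error stands in for a genuinely non-terminating Ω (so that forcing it is observable rather than looping), and absR and repR are our own names for the Abs and Rep contexts of the list-reversal example in Section 6.1, written as ordinary functions. Because absR returns a lambda abstraction, absR (repR omega) is already in weak head normal form, whereas omega is not; this mirrors the two sides of the weak cost-equivalence above.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Abs[M] = \xs -> M xs []
absR :: ([a] -> [a] -> [a]) -> [a] -> [a]
absR m = \xs -> m xs []

-- Rep[M] = \xs -> \ys -> M xs ++ ys
repR :: ([a] -> [a]) -> [a] -> [a] -> [a]
repR m = \xs ys -> m xs ++ ys

-- Stand-in for a divergent term: forcing it raises an ErrorCall.
omega :: a
omega = error "omega"

-- True if forcing the term to weak head normal form succeeds.
reachesWhnf :: ([Int] -> [Int]) -> IO Bool
reachesWhnf t = do
  r <- try (evaluate t) :: IO (Either ErrorCall ([Int] -> [Int]))
  pure (either (const False) (const True) r)
```

Here reachesWhnf (absR (repR omega)) returns True, while reachesWhnf omega returns False.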

Secondly, we note the restriction to value contexts. This 
is not actually a particularly severe restriction: for the com- 
mon application of recursively-defined functions, it is fairly 
straightforward to ensure that all contexts be of the form 
λx → C. For other applications it may be more difficult to
find Abs and Rep contexts with the required relationship. 

Finally, we note that only conditions (2) and (3) use nor- 
mal improvement, with all other assumptions and condi- 
tions using the weaker version. This is because weak im- 
provement is not strong enough to permit the use of fusion, 
which these conditions rely on. This makes these conditions 
harder to prove. However, when these conditions are used, 
their strength allows us to narrow down the source of any 
constant-factor slowdown that may take place. 

6. Examples 

6.1 Reversing a List 

In this section we shall demonstrate the utility of our theory 
with two practical examples. We begin by revisiting the 
earlier example of reversing a list. In order to apply our 
theory, we must first write reverse as a recursive let: 

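$$
\begin{array}{l}
\mathit{reverse} = \mathbf{let}\ \{f = \mathit{Revbody}[f]\}\ \mathbf{in}\ f \\[4pt]
\mathit{Revbody}[M] = \lambda xs \to \mathbf{case}\ xs\ \mathbf{of} \\
\qquad [\,] \to [\,] \\
\qquad (y : ys) \to M\ ys \mathbin{+\!\!+} [y]
\end{array}
$$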

The abs and rep functions from before give rise to the
following contexts:

$$
\begin{array}{l}
\mathit{Abs}[M] = \lambda xs \to M\ xs\ [\,] \\
\mathit{Rep}[M] = \lambda xs \to \lambda ys \to M\ xs \mathbin{+\!\!+} ys
\end{array}
$$
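For experimentation, the contexts above can be transcribed as Haskell functions of whatever fills the hole. This is a sketch under our own naming (revBody, slowReverse, absR and repR are illustrative), with Data.Function.fix standing in for the recursive let. It captures what the programs mean, not what they cost.

```haskell
import Data.Function (fix)

-- Revbody[M]: one unfolding of the naive, quadratic-time reverse.
revBody :: ([a] -> [a]) -> [a] -> [a]
revBody m xs = case xs of
  []       -> []
  (y : ys) -> m ys ++ [y]

-- reverse = let {f = Revbody[f]} in f
slowReverse :: [a] -> [a]
slowReverse = fix revBody

-- Abs[M] = \xs -> M xs []
absR :: ([a] -> [a] -> [a]) -> [a] -> [a]
absR m xs = m xs []

-- Rep[M] = \xs -> \ys -> M xs ++ ys
repR :: ([a] -> [a]) -> [a] -> [a] -> [a]
repR m xs ys = m xs ++ ys
```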

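We also require some extra theoretical machinery that
we have yet to introduce. To start with, we must assume
some rules about the append operation (++). The following
associativity rules were proved by Moran and Sands [15]:

$$
\begin{array}{l}
(xs \mathbin{+\!\!+} ys) \mathbin{+\!\!+} zs \mathrel{\underset{\sim}{\triangleright}} xs \mathbin{+\!\!+} (ys \mathbin{+\!\!+} zs) \\
xs \mathbin{+\!\!+} (ys \mathbin{+\!\!+} zs) \mathrel{\underset{\approx}{\triangleleft\triangleright}} (xs \mathbin{+\!\!+} ys) \mathbin{+\!\!+} zs
\end{array}
$$

We assume the following identity improvement as well,
which follows from theorems also proved in [15]:

$$
[\,] \mathbin{+\!\!+} xs \mathrel{\underset{\sim}{\triangleright}} xs
$$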
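The intuition behind these laws can be checked with a crude cost model of our own. It charges one unit per call of (++), that is, one per cons cell of the first argument plus one for the final [] case. This is a hypothetical approximation, not the tick semantics of Moran and Sands. It still shows the expected shape: the left-nested append never costs less than the right-nested one and never more than twice as much, which is the kind of constant-factor gap a weak cost-equivalence permits.

```haskell
-- Hypothetical cost model: one unit per call of (++).
appendCost :: [a] -> [a] -> ([a], Int)
appendCost []       ys = (ys, 1)
appendCost (x : xs) ys = let (r, c) = appendCost xs ys in (x : r, c + 1)

-- Cost of (xs ++ ys) ++ zs: xs is traversed twice.
leftNested :: [a] -> [a] -> [a] -> ([a], Int)
leftNested xs ys zs =
  let (inner, c1) = appendCost xs ys
      (outer, c2) = appendCost inner zs
  in (outer, c1 + c2)

-- Cost of xs ++ (ys ++ zs): each list is traversed once.
rightNested :: [a] -> [a] -> [a] -> ([a], Int)
rightNested xs ys zs =
  let (inner, c1) = appendCost ys zs
      (outer, c2) = appendCost xs inner
  in (outer, c1 + c2)
```

For xs = [1, 2, 3], ys = [4, 5] and zs = [6], the model charges 10 units for the left-nested form and 7 for the right-nested one.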

We also require the notion of an evaluation context.
Evaluation contexts are contexts in which evaluation cannot
proceed until the hole is filled, and have the following form:



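$$
\begin{array}{rcl}
A & ::= & [-] \;\mid\; A\ x \;\mid\; \mathbf{case}\ A\ \mathbf{of}\ \{c_i\ \vec{x}_i \to M_i\} \\[4pt]
E & ::= & A \\
  & \mid & \mathbf{let}\ \{\vec{x} = \vec{M}\}\ \mathbf{in}\ A \\
  & \mid & \mathbf{let}\ \{\vec{y} = \vec{M};\ x_0 = A_0[x_1];\ x_1 = A_1[x_2];\ \ldots;\ x_n = A_n\}\ \mathbf{in}\ A[x_0]
\end{array}
$$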



Note that a context of this form must have exactly one hole. 
The usefulness of evaluation contexts is that they satisfy 
some special laws. We use the following in this example: 

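$$
\begin{array}{lll}
E[\checkmark M] & \mathrel{\underset{\sim}{\triangleleft\triangleright}} & \checkmark E[M] \qquad \{\ \text{tick floating}\ \} \\
E[\mathbf{case}\ M\ \mathbf{of}\ \{c_i\ \vec{x}_i \to N_i\}] & \mathrel{\underset{\sim}{\triangleleft\triangleright}} & \mathbf{case}\ M\ \mathbf{of}\ \{c_i\ \vec{x}_i \to E[N_i]\} \qquad \{\ \text{case floating}\ \} \\
E[\mathbf{let}\ \{\vec{x} = \vec{M}\}\ \mathbf{in}\ N] & \mathrel{\underset{\sim}{\triangleleft\triangleright}} & \mathbf{let}\ \{\vec{x} = \vec{M}\}\ \mathbf{in}\ E[N] \qquad \{\ \text{let floating}\ \}
\end{array}
$$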

We conclude by noting that while the context [−] ++ ys is not
strictly speaking an evaluation context (as the hole is in the
wrong place), it is cost-equivalent to an evaluation context
and so also satisfies these laws. The proof is as follows:

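$$
\begin{array}{ll}
 & [-] \mathbin{+\!\!+} ys \\
= & \{\ \text{desugaring}\ \} \\
 & (\mathbf{let}\ \{xs = [-]\}\ \mathbf{in}\ (\mathbin{+\!\!+})\ xs)\ ys \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{let floating } [-]\ ys\ \} \\
 & \mathbf{let}\ \{xs = [-]\}\ \mathbf{in}\ (\mathbin{+\!\!+})\ xs\ ys \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{unfolding } (\mathbin{+\!\!+})\ \} \\
 & \mathbf{let}\ \{xs = [-]\}\ \mathbf{in}\ \checkmark\,\mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to ys;\ (z : zs) \to \mathbf{let}\ \{rs = (\mathbin{+\!\!+})\ zs\ ys\}\ \mathbf{in}\ z : rs\} \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{desugaring tick and collecting lets}\ \} \\
 & \mathbf{let}\ \{xs = [-];\ r = \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to ys;\ (z : zs) \to \mathbf{let}\ \{rs = (\mathbin{+\!\!+})\ zs\ ys\}\ \mathbf{in}\ z : rs\}\}\ \mathbf{in}\ r
\end{array}
$$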
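To make the grammar of evaluation contexts concrete, here is a small Haskell transcription. The term type is a hypothetical fragment of our own (variables as strings, arguments restricted to variables, no ticks or literals), so this is a sketch of the syntax only. plugE fills the single hole. In the chained form, each x_i is bound to A_i[x_{i+1}], the hole sits in the last binding, and the body is A[x_0]. This is exactly the shape obtained above for [−] ++ ys, with x_0 = r and x_1 = xs.

```haskell
type Var = String

-- A small, hypothetical fragment of the object language.
data Term
  = V Var                   -- variable
  | Lam Var Term            -- lambda abstraction
  | App Term Var            -- application to a variable
  | Let [(Var, Term)] Term  -- recursive let
  | Case Term [Alt]         -- case analysis
  deriving (Eq, Show)

-- A case alternative: constructor name, bound variables, body.
type Alt = (String, [Var], Term)

-- A ::= [-] | A x | case A of {alts}
data ACtx = Hole | AppA ACtx Var | CaseA ACtx [Alt]
  deriving (Eq, Show)

-- E ::= A | let {bs} in A | let {bs; x0 = A0[x1]; ...; xn = An} in A[x0]
data ECtx
  = Plain ACtx
  | LetIn [(Var, Term)] ACtx
  | LetChain [(Var, Term)] [(Var, ACtx)] ACtx
  deriving (Eq, Show)

-- Fill the unique hole of an A-context.
plugA :: ACtx -> Term -> Term
plugA Hole           m = m
plugA (AppA a x)     m = App (plugA a m) x
plugA (CaseA a alts) m = Case (plugA a m) alts

-- Fill the unique hole of an E-context.
plugE :: ECtx -> Term -> Term
plugE (Plain a)    m = plugA a m
plugE (LetIn bs a) m = Let bs (plugA a m)
plugE (LetChain bs chain a) m = case chain of
  []            -> error "plugE: the chain must be non-empty"
  ((x0, _) : _) -> Let (bs ++ links chain) (plugA a (V x0))
  where
    -- x_i = A_i[x_{i+1}] for each link, and x_n = A_n[m] for the last one.
    links [(x, an)]                     = [(x, plugA an m)]
    links ((x, ai) : rest@((y, _) : _)) = (x, plugA ai (V y)) : links rest
    links []                            = []
```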

Now we can begin the example proper. We start by
verifying that Abs and Rep satisfy one of the worker/wrapper
assumptions. While earlier we used (A) for this example, the
corresponding assumption for worker/wrapper improvement
is unsatisfiable. Thus we instead verify assumption (B). The
proof is fairly straightforward:

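$$
\begin{array}{ll}
 & \mathit{Abs}[\mathit{Rep}[\mathit{Revbody}[f]]] \\
= & \{\ \text{definitions}\ \} \\
 & \lambda xs \to (\lambda xs \to \lambda ys \to \mathit{Revbody}[f]\ xs \mathbin{+\!\!+} ys)\ xs\ [\,] \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \beta\text{-reduction}\ \} \\
 & \lambda xs \to \mathit{Revbody}[f]\ xs \mathbin{+\!\!+} [\,] \\
= & \{\ \text{definition of } \mathit{Revbody}\ \} \\
 & \lambda xs \to (\lambda xs \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to [\,];\ (y : ys) \to f\ ys \mathbin{+\!\!+} [y]\})\ xs \mathbin{+\!\!+} [\,] \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \beta\text{-reduction}\ \} \\
 & \lambda xs \to (\mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to [\,];\ (y : ys) \to f\ ys \mathbin{+\!\!+} [y]\}) \mathbin{+\!\!+} [\,] \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{case floating } [-] \mathbin{+\!\!+} [\,]\ \} \\
 & \lambda xs \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to [\,] \mathbin{+\!\!+} [\,];\ (y : ys) \to (f\ ys \mathbin{+\!\!+} [y]) \mathbin{+\!\!+} [\,]\} \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{associativity is weak cost equivalence}\ \} \\
 & \lambda xs \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to [\,] \mathbin{+\!\!+} [\,];\ (y : ys) \to f\ ys \mathbin{+\!\!+} ([y] \mathbin{+\!\!+} [\,])\} \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{evaluating } [\,] \mathbin{+\!\!+} [\,] \text{ and } [y] \mathbin{+\!\!+} [\,]\ \} \\
 & \lambda xs \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to [\,];\ (y : ys) \to f\ ys \mathbin{+\!\!+} [y]\} \\
= & \{\ \text{definition of } \mathit{Revbody}\ \} \\
 & \mathit{Revbody}[f]
\end{array}
$$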

As before, we use condition (2) to derive our G. The deriva- 
tion is somewhat more involved than before, requiring some 
care with the manipulation of ticks. 

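$$
\begin{array}{ll}
 & \mathit{Rep}[\checkmark\mathit{Revbody}[f]] \\
= & \{\ \text{definitions}\ \} \\
 & \lambda xs \to \lambda ys \to (\checkmark\lambda xs \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to [\,];\ (z : zs) \to f\ zs \mathbin{+\!\!+} [z]\})\ xs \mathbin{+\!\!+} ys \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{float tick out of } [-]\ xs \mathbin{+\!\!+} ys\ \} \\
 & \lambda xs \to \lambda ys \to \checkmark((\lambda xs \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to [\,];\ (z : zs) \to f\ zs \mathbin{+\!\!+} [z]\})\ xs \mathbin{+\!\!+} ys) \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \beta\text{-reduction}\ \} \\
 & \lambda xs \to \lambda ys \to \checkmark((\mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to [\,];\ (z : zs) \to f\ zs \mathbin{+\!\!+} [z]\}) \mathbin{+\!\!+} ys) \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{case floating } [-] \mathbin{+\!\!+} ys\ \} \\
 & \lambda xs \to \lambda ys \to \checkmark(\mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to [\,] \mathbin{+\!\!+} ys;\ (z : zs) \to (f\ zs \mathbin{+\!\!+} [z]) \mathbin{+\!\!+} ys\}) \\
\mathrel{\underset{\sim}{\triangleright}} & \{\ \text{associativity and identity of } (\mathbin{+\!\!+})\ \} \\
 & \lambda xs \to \lambda ys \to \checkmark(\mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to ys;\ (z : zs) \to f\ zs \mathbin{+\!\!+} ([z] \mathbin{+\!\!+} ys)\}) \\
\mathrel{\underset{\sim}{\triangleright}} & \{\ \text{evaluating } [z] \mathbin{+\!\!+} ys\ \} \\
 & \lambda xs \to \lambda ys \to \checkmark(\mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to ys;\ (z : zs) \to f\ zs \mathbin{+\!\!+} (z : ys)\}) \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{case floating tick } (*)\ \} \\
 & \lambda xs \to \lambda ys \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to \checkmark ys;\ (z : zs) \to \checkmark(f\ zs \mathbin{+\!\!+} (z : ys))\} \\
\mathrel{\underset{\sim}{\triangleright}} & \{\ \text{removing a tick}\ \} \\
 & \lambda xs \to \lambda ys \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to ys;\ (z : zs) \to \checkmark(f\ zs \mathbin{+\!\!+} (z : ys))\} \\
= & \{\ \text{desugaring}\ \} \\
 & \lambda xs \to \lambda ys \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to ys;\ (z : zs) \to \checkmark(\mathbf{let}\ ws = (z : ys)\ \mathbf{in}\ f\ zs \mathbin{+\!\!+} ws)\} \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \beta\text{-expansion}\ \} \\
 & \lambda xs \to \lambda ys \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to ys;\ (z : zs) \to \checkmark\,\mathbf{let}\ ws = (z : ys)\ \mathbf{in}\ (\lambda as \to \lambda bs \to f\ as \mathbin{+\!\!+} bs)\ zs\ ws\} \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{tick floating } [-]\ zs\ ws\ \} \\
 & \lambda xs \to \lambda ys \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to ys;\ (z : zs) \to \mathbf{let}\ ws = (z : ys)\ \mathbf{in}\ (\checkmark\lambda as \to \lambda bs \to f\ as \mathbin{+\!\!+} bs)\ zs\ ws\} \\
= & \{\ \text{definition of } \mathit{Rep}\ \} \\
 & \lambda xs \to \lambda ys \to \mathbf{case}\ xs\ \mathbf{of}\ \{[\,] \to ys;\ (z : zs) \to \mathbf{let}\ ws = (z : ys)\ \mathbf{in}\ (\checkmark\mathit{Rep}[f])\ zs\ ws\} \\
= & \{\ \text{taking this as our definition of } G\ \} \\
 & G[\checkmark\mathit{Rep}[f]]
\end{array}
$$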

The step marked (*) is valid because ✓[−] is itself an
evaluation context, being syntactic sugar for let x = [−] in x.
Thus we have derived a definition of G, from which we create
the following factorised program:

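$$
\begin{array}{l}
\mathit{reverse} = \mathbf{let}\ \{\mathit{rec} = G[\mathit{rec}]\}\ \mathbf{in}\ \mathit{Abs}[\mathit{rec}] \\[4pt]
G[\mathit{rec}] = \lambda xs \to \lambda ys \to \mathbf{case}\ xs\ \mathbf{of} \\
\qquad [\,] \to ys \\
\qquad (z : zs) \to \mathbf{let}\ ws = (z : ys)\ \mathbf{in}\ \mathit{rec}\ zs\ ws
\end{array}
$$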

Expanding this out, we obtain: 

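$$
\begin{array}{l}
\mathit{reverse} = \mathbf{let}\ \{\mathit{rec} = \lambda xs \to \lambda ys \to \mathbf{case}\ xs\ \mathbf{of} \\
\qquad\qquad [\,] \to ys \\
\qquad\qquad (z : zs) \to \mathbf{let}\ ws = (z : ys)\ \mathbf{in}\ \mathit{rec}\ zs\ ws\} \\
\qquad \mathbf{in}\ \lambda xs \to \mathit{rec}\ xs\ [\,]
\end{array}
$$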

The result is an implementation of fast reverse as a recursive 
let. The calculations here have essentially the same structure 
as the correctness proofs, with the addition of some admin- 
istrative steps to do with the manipulation of ticks. 
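As a sanity check, the derived program can be transcribed into Haskell and compared with the Prelude's reverse. The sketch below uses our own names. The worker is called worker rather than rec only to avoid a clash with the rec keyword of GHC's RecursiveDo extension. fastReverse follows the factorised form, and fastReverse' is the expanded form.

```haskell
import Data.Function (fix)

-- G[rec]: the accumulating worker derived above.
revG :: ([a] -> [a] -> [a]) -> [a] -> [a] -> [a]
revG worker xs ys = case xs of
  []       -> ys
  (z : zs) -> let ws = z : ys in worker zs ws

-- reverse = let {rec = G[rec]} in Abs[rec], where Abs[M] = \xs -> M xs [].
fastReverse :: [a] -> [a]
fastReverse = \xs -> fix revG xs []

-- The expanded form, with the worker as a local recursive binding.
fastReverse' :: [a] -> [a]
fastReverse' = \xs -> worker xs []
  where
    worker xs ys = case xs of
      []       -> ys
      (z : zs) -> let ws = z : ys in worker zs ws
```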

To illustrate the performance gain, we have graphed
the performance of the original reverse function against
the optimised version in Figure 2. We used the Criterion
benchmarking library [18] with a range of list lengths to
compare the performance of the two functions. The resulting
graph shows a clear improvement from quadratic time to
linear. We chose to use relatively small list lengths for our
graphs, but the trend continues for larger values.
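The benchmark itself is not reproduced here. As a rough substitute, the sketch below counts calls under a hypothetical cost model of our own: one unit per call of (++) or of the reversing function. This model is not the paper's tick semantics and not a Criterion measurement, but it exposes the same asymptotic shape. For a list of length n, the original costs (n + 1) + n(n + 1)/2 units and the factorised version costs n + 1.

```haskell
-- Hypothetical cost model: one unit per call of (++).
appendCost :: [a] -> [a] -> ([a], Int)
appendCost []       ys = (ys, 1)
appendCost (x : xs) ys = let (r, c) = appendCost xs ys in (x : r, c + 1)

-- Original reverse: Revbody, with (++) appending a singleton on the right.
slowRevCost :: [a] -> ([a], Int)
slowRevCost []       = ([], 1)
slowRevCost (y : ys) =
  let (r, c1)  = slowRevCost ys
      (r', c2) = appendCost r [y]
  in (r', c1 + c2 + 1)

-- Factorised reverse: one unit per call of the accumulating worker.
fastRevCost :: [a] -> ([a], Int)
fastRevCost xs0 = go xs0 [] 0
  where
    go []       ys c = (ys, c + 1)
    go (z : zs) ys c = go zs (z : ys) (c + 1)
```

For n = 10, this model gives 66 units for the original and 11 for the factorised version.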

6.2 Tabulating a Function 

Our second example is that of tabulating a function by
producing a stream (infinite list) of results. Given a function
f that takes a natural number as its argument, the tabulate
function should produce the following result:

$$
[f\ 0,\ f\ 1,\ f\ 2,\ f\ 3,\ \ldots]
$$

This function can be implemented in Haskell as follows:

$$
\mathit{tabulate}\ f = f\ 0 : \mathit{tabulate}\ (f \circ (+1))
$$

This definition is inefficient, as it requires that the argument
to f be recalculated for each element of the result stream.
Essentially, this definition corresponds to the following
calculation, involving a significant amount of repeated work:

$$
[f\ 0,\ f\ (0 + 1),\ f\ ((0 + 1) + 1),\ f\ (((0 + 1) + 1) + 1),\ \ldots]
$$
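The repeated work can be made visible by running the original definition at a symbolic number type that records how each argument was built. Sym and render are hypothetical helpers of our own, not part of the development. The Num instance simply prints expressions, and tabulate is the definition above generalised to any Num type.

```haskell
-- A symbolic "number" that records the expression used to build it.
newtype Sym = Sym String
  deriving (Eq, Show)

instance Num Sym where
  fromInteger n  = Sym (show n)
  Sym a + Sym b  = Sym ("(" ++ a ++ " + " ++ b ++ ")")
  Sym a * Sym b  = Sym ("(" ++ a ++ " * " ++ b ++ ")")
  negate (Sym a) = Sym ("negate " ++ a)
  abs (Sym a)    = Sym ("abs " ++ a)
  signum (Sym a) = Sym ("signum " ++ a)

-- The original definition, generalised to any numeric type.
tabulate :: Num n => (n -> b) -> [b]
tabulate f = f 0 : tabulate (f . (+ 1))

-- Extract the recorded expression.
render :: Sym -> String
render (Sym s) = s
```

Here take 4 (tabulate render) evaluates to ["0", "(0 + 1)", "((0 + 1) + 1)", "(((0 + 1) + 1) + 1)"]. Each element rebuilds its chain of additions from scratch, as in the calculation above.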

We wish to apply the worker/wrapper technique to improve
the time performance of this program. The first step
is to write it as a recursive let in our language:
Figure 2. Performance comparisons of reverse and tabulate



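$$
\begin{array}{l}
\mathit{tabulate} = \mathbf{let}\ \{h = F[h]\}\ \mathbf{in}\ h \\[4pt]
F[M] = \lambda f \to \mathbf{let}\ \{f' = \lambda x \to \mathbf{let}\ \{x' = x + 1\}\ \mathbf{in}\ f\ x'\} \\
\qquad\qquad \mathbf{in}\ f\ 0 : M\ f'
\end{array}
$$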

Next, we must devise Abs and Rep contexts. In order to 
avoid the repeated work, we hope to derive a version of the 
tabulate function that takes an additional number argument 
telling it where to "start" from. The following Abs and Rep 
contexts convert between these two versions: 

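$$
\begin{array}{l}
\mathit{Abs}[M] = \lambda f \to M\ 0\ f \\[4pt]
\mathit{Rep}[M] = \lambda n \to \lambda f \to \mathbf{let}\ \{f' = \lambda x \to \mathbf{let}\ \{x' = x + n\}\ \mathbf{in}\ f\ x'\} \\
\qquad\qquad \mathbf{in}\ M\ f'
\end{array}
$$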
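As with reverse, these contexts can be transcribed as Haskell functions. This sketch assumes Integer as the number type, and tabF, tabAbs, tabRep and slowTabulate are our own names. Note how Rep shifts the function by n, so that Rep[M] n f tabulates f starting from n.

```haskell
import Data.Function (fix)

-- F[M] = \f -> let {f' = \x -> let {x' = x + 1} in f x'} in f 0 : M f'
tabF :: ((Integer -> b) -> [b]) -> (Integer -> b) -> [b]
tabF m f = let f' x = let x' = x + 1 in f x' in f 0 : m f'

-- tabulate = let {h = F[h]} in h
slowTabulate :: (Integer -> b) -> [b]
slowTabulate = fix tabF

-- Abs[M] = \f -> M 0 f
tabAbs :: (Integer -> (Integer -> b) -> [b]) -> (Integer -> b) -> [b]
tabAbs m f = m 0 f

-- Rep[M] = \n -> \f -> let {f' = \x -> let {x' = x + n} in f x'} in M f'
tabRep :: ((Integer -> b) -> [b]) -> Integer -> (Integer -> b) -> [b]
tabRep m n f = let f' x = let x' = x + n in f x' in m f'
```

For instance, take 3 (tabRep slowTabulate 5 id) is [5, 6, 7].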

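Once again, we must introduce some new rules before we
can derive the factorised program. Firstly, we require the
following two variable substitution rules from [15]:

$$
\begin{array}{l}
\mathbf{let}\ \{x = y\}\ \mathbf{in}\ C[x] \mathrel{\underset{\sim}{\triangleright}} \mathbf{let}\ \{x = y\}\ \mathbf{in}\ C[y] \\
\mathbf{let}\ \{x = y\}\ \mathbf{in}\ C[y] \mathrel{\underset{\approx}{\triangleleft\triangleright}} \mathbf{let}\ \{x = y\}\ \mathbf{in}\ C[x]
\end{array}
$$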

Next, we must use some properties of addition. Firstly, we 
have the following identity properties: 

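$$
\begin{array}{l}
x + 0 \mathrel{\underset{\sim}{\triangleleft\triangleright}} x \\
0 + x \mathrel{\underset{\sim}{\triangleleft\triangleright}} x
\end{array}
$$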

We also use the following property, combining associativity 
and commutativity. We shall refer to this as associativity 
of +. Where t is not free in C, we have: 

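$$
\mathbf{let}\ \{t = x + y\}\ \mathbf{in}\ \mathbf{let}\ \{r = t + z\}\ \mathbf{in}\ C[r]
\mathrel{\underset{\sim}{\triangleleft\triangleright}}
\mathbf{let}\ \{t = z + y\}\ \mathbf{in}\ \mathbf{let}\ \{r = x + t\}\ \mathbf{in}\ C[r]
$$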

Finally, we use the fact that sums may be floated out of 
arbitrary contexts. Where z does not occur in C, we have: 

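$$
C[\mathbf{let}\ \{z = y + x\}\ \mathbf{in}\ M] \mathrel{\underset{\sim}{\triangleleft\triangleright}} \mathbf{let}\ \{z = y + x\}\ \mathbf{in}\ C[M]
$$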



Now we can begin to apply worker/wrapper. Firstly, we
verify that Abs and Rep satisfy assumption (B). Again, this
is relatively straightforward:

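$$
\begin{array}{ll}
 & \mathit{Abs}[\mathit{Rep}[F[h]]] \\
= & \{\ \text{definitions}\ \} \\
 & \lambda f \to (\lambda n \to \lambda f \to \mathbf{let}\ \{f' = \lambda x \to \mathbf{let}\ \{x' = x + n\}\ \mathbf{in}\ f\ x'\}\ \mathbf{in}\ F[h]\ f')\ 0\ f \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \beta\text{-reduction}\ \} \\
 & \lambda f \to \mathbf{let}\ \{f' = \lambda x \to \mathbf{let}\ \{x' = x + 0\}\ \mathbf{in}\ f\ x'\}\ \mathbf{in}\ F[h]\ f' \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ x + 0 \mathrel{\underset{\sim}{\triangleleft\triangleright}} x\ \} \\
 & \lambda f \to \mathbf{let}\ \{f' = \lambda x \to \mathbf{let}\ \{x' = x\}\ \mathbf{in}\ f\ x'\}\ \mathbf{in}\ F[h]\ f' \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{variable substitution, garbage collection}\ \} \\
 & \lambda f \to \mathbf{let}\ \{f' = \lambda x \to f\ x\}\ \mathbf{in}\ F[h]\ f' \\
= & \{\ \text{definition of } F\ \} \\
 & \lambda f \to \mathbf{let}\ \{f' = \lambda x \to f\ x\}\ \mathbf{in}\ (\lambda g \to \mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{x' = x + 1\}\ \mathbf{in}\ g\ x'\}\ \mathbf{in}\ g\ 0 : h\ g')\ f' \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \beta\text{-reduction}\ \} \\
 & \lambda f \to \mathbf{let}\ \{f' = \lambda x \to f\ x\}\ \mathbf{in}\ \mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{x' = x + 1\}\ \mathbf{in}\ f'\ x'\}\ \mathbf{in}\ f'\ 0 : h\ g' \\
\mathrel{\underset{\approx}{\triangleleft\triangleright}} & \{\ \text{value-}\beta \text{ on } f'\ \} \\
 & \lambda f \to \mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{x' = x + 1\}\ \mathbf{in}\ (\lambda x \to f\ x)\ x'\}\ \mathbf{in}\ (\lambda x \to f\ x)\ 0 : h\ g' \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \beta\text{-reduction}\ \} \\
 & \lambda f \to \mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{x' = x + 1\}\ \mathbf{in}\ f\ x'\}\ \mathbf{in}\ f\ 0 : h\ g' \\
= & \{\ \text{definition of } F\ \} \\
 & F[h]
\end{array}
$$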

Now we use condition (2) to derive the new definition of 
tabulate. This requires the use of a number of the properties 
that we presented earlier: 

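$$
\begin{array}{ll}
 & \mathit{Rep}[\checkmark F[h]] \\
= & \{\ \text{definitions}\ \} \\
 & \lambda n \to \lambda f \to \mathbf{let}\ \{f' = \lambda x \to \mathbf{let}\ \{x' = x + n\}\ \mathbf{in}\ f\ x'\} \\
 & \qquad \mathbf{in}\ (\checkmark\lambda g \to \mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{x'' = x + 1\}\ \mathbf{in}\ g\ x''\}\ \mathbf{in}\ g\ 0 : h\ g')\ f' \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{tick floating } [-]\ f'\ \} \\
 & \lambda n \to \lambda f \to \mathbf{let}\ \{f' = \lambda x \to \mathbf{let}\ \{x' = x + n\}\ \mathbf{in}\ f\ x'\} \\
 & \qquad \mathbf{in}\ \checkmark((\lambda g \to \mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{x'' = x + 1\}\ \mathbf{in}\ g\ x''\}\ \mathbf{in}\ g\ 0 : h\ g')\ f') \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \beta\text{-reduction}\ \} \\
 & \lambda n \to \lambda f \to \mathbf{let}\ \{f' = \lambda x \to \mathbf{let}\ \{x' = x + n\}\ \mathbf{in}\ f\ x'\} \\
 & \qquad \mathbf{in}\ \checkmark\,\mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{x'' = x + 1\}\ \mathbf{in}\ f'\ x''\}\ \mathbf{in}\ f'\ 0 : h\ g' \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{value-}\beta \text{ on } f'\text{, garbage collection}\ \} \\
 & \lambda n \to \lambda f \to \checkmark\,\mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{x'' = x + 1\}\ \mathbf{in}\ (\checkmark\lambda y \to \mathbf{let}\ \{y' = y + n\}\ \mathbf{in}\ f\ y')\ x''\} \\
 & \qquad \mathbf{in}\ (\checkmark\lambda y \to \mathbf{let}\ \{y' = y + n\}\ \mathbf{in}\ f\ y')\ 0 : h\ g' \\
\mathrel{\underset{\sim}{\triangleright}} & \{\ \text{removing ticks, } \beta\text{-reduction}\ \} \\
 & \lambda n \to \lambda f \to \checkmark\,\mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{x'' = x + 1\}\ \mathbf{in}\ \mathbf{let}\ \{y' = x'' + n\}\ \mathbf{in}\ f\ y'\} \\
 & \qquad \mathbf{in}\ (\mathbf{let}\ \{y' = 0 + n\}\ \mathbf{in}\ f\ y') : h\ g' \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{associativity and identity of } +\ \} \\
 & \lambda n \to \lambda f \to \checkmark\,\mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{n' = n + 1\}\ \mathbf{in}\ \mathbf{let}\ \{y' = x + n'\}\ \mathbf{in}\ f\ y'\} \\
 & \qquad \mathbf{in}\ (\mathbf{let}\ \{y' = n\}\ \mathbf{in}\ f\ y') : h\ g' \\
\mathrel{\underset{\sim}{\triangleright}} & \{\ \text{variable substitution, garbage collection}\ \} \\
 & \lambda n \to \lambda f \to \checkmark\,\mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{n' = n + 1\}\ \mathbf{in}\ \mathbf{let}\ \{y' = x + n'\}\ \mathbf{in}\ f\ y'\} \\
 & \qquad \mathbf{in}\ f\ n : h\ g' \\
\mathrel{\underset{\sim}{\triangleright}} & \{\ \text{value let-floating}\ \} \\
 & \lambda n \to \lambda f \to f\ n : \checkmark\,\mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{n' = n + 1\}\ \mathbf{in}\ \mathbf{let}\ \{y' = x + n'\}\ \mathbf{in}\ f\ y'\}\ \mathbf{in}\ h\ g' \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \text{sums float}\ \} \\
 & \lambda n \to \lambda f \to f\ n : \mathbf{let}\ \{n' = n + 1\}\ \mathbf{in}\ \checkmark\,\mathbf{let}\ \{g' = \lambda x \to \mathbf{let}\ \{y' = x + n'\}\ \mathbf{in}\ f\ y'\}\ \mathbf{in}\ h\ g' \\
\mathrel{\underset{\sim}{\triangleleft\triangleright}} & \{\ \beta\text{-expansion, tick floating}\ \} \\
 & \lambda n \to \lambda f \to f\ n : \mathbf{let}\ \{n' = n + 1\}\ \mathbf{in} \\
 & \qquad (\checkmark\lambda n \to \lambda f \to \mathbf{let}\ \{f' = \lambda x \to \mathbf{let}\ \{x' = x + n\}\ \mathbf{in}\ f\ x'\}\ \mathbf{in}\ h\ f')\ n'\ f \\
= & \{\ \text{definition of } \mathit{Rep}\ \} \\
 & \lambda n \to \lambda f \to f\ n : \mathbf{let}\ \{n' = n + 1\}\ \mathbf{in}\ (\checkmark\mathit{Rep}[h])\ n'\ f \\
= & \{\ \text{taking this as our definition of } G\ \} \\
 & G[\checkmark\mathit{Rep}[h]]
\end{array}
$$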

Thus we have derived a definition of G, from which we create 
the following factorised program: 

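$$
\begin{array}{l}
\mathit{tabulate} = \mathbf{let}\ \{h = G[h]\}\ \mathbf{in}\ \mathit{Abs}[h] \\[4pt]
G[M] = \lambda n \to \lambda f \to f\ n : \mathbf{let}\ \{n' = n + 1\}\ \mathbf{in}\ M\ n'\ f
\end{array}
$$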

This is the same optimised tabulate function that was
proved correct in [10], and the proofs here have a similar
structure to the correctness proofs from that paper, except
that we have now formalised that the new version of the
tabulate function is indeed a time improvement of the
original version. We note that the proof of (B) is complicated
by the fact that η-reduction is not valid in this setting. In
fact, if we assumed η-reduction then our proof of (B) here
could be adapted into a proof of (A).
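The derived program also transcribes directly into Haskell. In this sketch, which uses our own names and assumes Integer as the number type, each element's argument is the shared counter n. The successor n' = n + 1 is computed once per element rather than rebuilt from 0, and the original definition is included for comparison.

```haskell
import Data.Function (fix)

-- G[M] = \n -> \f -> f n : let {n' = n + 1} in M n' f
tabG :: (Integer -> (Integer -> b) -> [b]) -> Integer -> (Integer -> b) -> [b]
tabG m n f = f n : let n' = n + 1 in m n' f

-- tabulate = let {h = G[h]} in Abs[h], where Abs[M] = \f -> M 0 f.
fastTabulate :: (Integer -> b) -> [b]
fastTabulate = \f -> fix tabG 0 f

-- The original definition, for comparison.
slowTabulate :: (Integer -> b) -> [b]
slowTabulate f = f 0 : slowTabulate (f . (+ 1))
```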

We demonstrate the performance gain in Figure 2, again
based on Criterion benchmarks. This time, we keep the same
input (in this case the function λn → n * n), but vary
how many elements of the result stream we evaluate. Once
again, we have an improvement from quadratic to linear
performance, and the trend continues for larger values.



7. Related Work 

We divide the related work into three sections. Firstly, we 
discuss various approaches to the operational semantics of 
lazy languages. Secondly, we discuss the history of improve- 
ment theory. Finally, we discuss other approaches that have 
been used to formally reason about efficiency. 

7.1 Lazy Operational Semantics 

The notion of call-by-need evaluation was first introduced
in 1971 by Wadsworth [30]. However, the semantics most
widely regarded as the definition of call-by-need is the
natural semantics due to Launchbury |1J|, which was later used
by Sestoft to derive the virtual machine semantics we use
in this paper |g7|. Ariola, Felleisen, Maraist, Odersky and
Wadler presented a call-by-need lambda calculus [1], with
operational semantics based on reductions between terms in
the source language. This calculus supports an equational
theory. However, Moran and Sands showed that this
equational theory is subsumed by weak cost-equivalence [15].
7.2 Improvement Theory

Improvement theory was originally developed in 1991 by
Sands and applied in a call-by-name setting. In 1997
this was generalised to a wide class of call-by-name and
call-by-value languages, also by Sands [22]. This theory was also
applicable to a general class of resources, rather than just
space and time. The theory for lazy languages was developed
by Moran and Sands for time efficiency [15] and Gustavsson
and Sands for space efficiency [8, 9]. Since the last of these
papers was published in 2001, there does not seem to have
been much work on improvement theory. We hope that this
paper can help to regenerate interest in this topic.

7.3 Formal Reasoning About Efficiency 

Okasaki |l^ uses techniques of amortised cost analysis to
reason about the asymptotic time complexity of lazy
functional data structures. This is achieved by modifying
analysis techniques such as the Banker's Method, where the
notion of credit is used to spread out the notional cost of
expensive but infrequent operations over more frequent and
cheaper operations. The key idea in Okasaki's work is to
invert such techniques to use the notion of debt. This al- 
lows the analyses to deal with the persistence of data struc- 
tures, where the same structure may exist in multiple ver- 
sions at once. While credit may only be spent once, a single 
debt may be paid off multiple times (in different versions 
of the same structure) without risking bankruptcy. These 
techniques have been used to analyse the asymptotic per- 
formance of a number of functional data structures. 

Sansom and Peyton Jones [24] give a presentation of
the GHC profiler, which can be used to measure time as 
well as space usage of Haskell programs. In doing so, they 
give a formal cost semantics for GHC Core programs based 
around the notion of cost centres. Cost centres are a way 
of annotating expressions, so that the profiler can indicate 
which parts of the source program cost the most to execute. 
The cost semantics is used as a specification to develop 
a precise profiling framework, as well as to prove various 
properties about cost attribution and verify that certain 
program transformations do not affect the attribution of 
costs, though they may of course reduce cost overall. Cost 
centres are now widely used in profiling Haskell programs.

Hope pil applies a technique based on instrumenting an 
abstract machine with cost information to derive a cost se- 
mantics for call-by-value functional programs. More specifi- 
cally, starting from a denotational semantics for the source 
language, one derives an abstract machine for this language 
using standard program transformation techniques, instru- 
ments this machine with cost information, and then reverses 
the derivation to arrive at an instrumented denotational se- 
mantics. This semantics can then be used to reason about 
the cost of programs in the high-level source language with- 
out reference to the details of the abstract machine. This 
approach was used to calculate the space and time cost of a 
range of programming examples, as well as to derive a new 
deforestation theorem for hylomorphisms. 

8. Conclusion 

In this paper, we have shown how improvement theory can
be used to justify the worker/wrapper transformation as a
program optimisation, by formally proving that, under
certain natural conditions, the transformation is guaranteed
to preserve or improve time performance. This guarantee
is with respect to an established operational semantics for
call-by-need evaluation. We then verified that two examples
from previous worker/wrapper papers met the preconditions
for this performance guarantee, demonstrating the use of our
theory while also verifying the validity of the examples. This
work appears to be the first time that rigorous performance
guarantees have been given for a general purpose
optimisation technique in a call-by-need setting.

8.1 Further Work 

As well as for fixed points, worker/wrapper theories also
exist for more structured recursion operators such as folds [13]
and unfolds p. Of. Though the theory we present here can
be specialised to such operators, it may be beneficial to in- 
vestigate this more closely, as doing so may reveal more 
interesting and subtle details yet to be uncovered. 

As we mentioned earlier in this paper, a typed theory 
would be more useful, allowing more power when reasoning 
about programs. This would also match more closely with 
the original worker/wrapper theories, which were typed.
The key barrier to this is that there is currently no typed 
improvement theory, so such a theory would have to be 
developed before the theory here could be made typed. 

The theory we present here only applies to time efficiency. 
Gustavsson and Sands have developed an improvement
theory for space [8], [9], so this would be an obvious next step
for developing our theory. More generally, we could apply a
technique such as that used by Sands [22] to develop a
theory that applies to a large class of resources, and examine
which assumptions must be made about the resources we 
consider for our theory to apply. 

Assumptions (A), (B) and (C) are written as weak cost- 
equivalences, which limits the scope of our theory to cases 
where Abs and Rep are fairly simple. We would like to also 
be able to cover cases where the Abs and Rep contexts cor- 
respond to expensive operations, but the extra cost is made 
up for by the overall efficiency gain of the transformation. 
To cover such cases, we would require a richer version of im- 
provement theory that is able to quantify how much better 
one program is than another. 

As our examples show, the calculations required to de- 
rive an improved program can often be quite involved. The 
HERMIT system devised by a team at the University of
Kansas |jfj, Eg, facilitates program transformations by pro- 
viding an interactive interface for program transformation 
that verifies correctness. If improvement theory could be in- 
tegrated into such a system, it would be significantly easier 
to apply our worker/wrapper improvement theory.

Finally, we are working on a general worker/wrapper
theory that will apply to any operator with the property of
dinaturality [5]. It is also interesting to consider whether 
such a general categorical approach can be applied to an 
operational theory. If this is the case, dinaturality may also 
provide the necessary machinery to unify the denotational 
(correctness) and operational (efficiency) theories, which as 
we have already observed in this paper are very similar in 
terms of their formulations and proofs. Voigtländer and
Johann used parametricity to justify program transformations
from a perspective of observational approximation [29]. It
may be productive to investigate whether their techniques 
can be applied to a notion of improvement. 